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Abstract 

In this paper, a new nonlinear identification framework is proposed to address the issue of off-line 
computation of moving-horizon observer estimate. The proposed structure merges the advantages of 
nonlinear approximators with the efficient computation of constrained quadratic programming problems. 
A bound on the estimation error is proposed and the efficiency of the resulting scheme is illustrated 
using two state estimation examples. 

I. INTRODUCTION 

State estimation is a key issue in nonlinear systems control and diagnosis. Algorithms that achieve 
this task are called observers. These algorithms attempt to reconstruct the evolution of the state 
vector by using the only measured (and generally noisy) quantities. As far as nonlinear systems 
are concerned, many observation techniques have been developed during the last four decades. 
This includes high-gain observers 0, sliding-modes observers [15J, Moving-Horizon Observers 
(MHOs) [fTTI and naturally, the widely used Extended-Kalman-Filter (EKF). Excellent reviews 
of nonlinear observer design techniques can be found in lfl4l and J6j. 

Amongst all possible observer design alternatives, MHO technique has witnessed an increasing 
interest these last years because of its ability to handle constraints and to fully exploit precise and 
generally nonlinear models of the dynamic processes under study. This observer requires on-line 
solution of non convex optimization problems in which the cost function is the integral output 
prediction error while the decision variable is the set of unknown quantities to be recovered 
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(state and unknown parameter vectors). 

Despite encouraging recent advances on the issue of real-time computation of MHOs (see 0, 
JH and the references therein for a recent survey on the topics), the highly demanding on-line 
computation involved may question the feasibility of the algorithm when system needing high 
sampling rate are involved or when the use of highly involved optimization software is prohibited 
by real-life context. When the combination of such obstacles and the absence of mathematical 
structure that renders impossible the use of analytical observers, something has to be done in 
order to achieve the estimation task. 

The ideas proposed in this paper follow a suggestion made by [1] aiming at identifying off-line 
the relationship between the sequence of (input/output) measurements and the corresponding 
initial state at the beginning of the moving observation window. By doing so, the cumbersome 
on-line optimization step involved in Moving-Horizon Observers IfTTTl (MHOs) can be avoided. 
Concrete implementation of such an idea using Neural Networks (NNs) identification structure 
has been proposed in 0. As pointed out in 0], NNs structures, like all standard nonlinear 
approximators offer high approximation capabilities at the price of non convex optimization 
schemes which generally suffer from convergence and computation time issues. In the present 
paper, a novel nonlinear identification scheme is proposed, which after a suitable change in the 
decision variables, can be solved using constrained quadratic programming. Such a feature is 
crucial if this optimization problem lies in the inner loop of a whole static game optimization 
formulation (such as the one proposed in [1]) that may be needed to assess the convergence of the 
resulting MHO. More precisely, the solution of the static game is needed to compute the set of 
approximator parameters that achieves sufficiently small maximum error over all possible initial 
states (and initial estimation error). The price to pay in order to gain computational efficiency is a 
theoretical restriction of the class of situations that can be addressed as the proposed identification 
structure possesses no universal approximation property. For this reason, the comparison between 
the two identification schemes is problem-dependent. 

It is important to underline that in the scheme proposed in [UJ, [J2]| a set of non convex optimiza- 
tion problems are solved off-line (using different initial conditions and initial state estimation 
errors) in order to built the learning data for the identification step. Instead, the scheme of the 



present paper avoids the use of such non convex optimization by focusing on the model-based 
state/output relationship that can be discovered using system simulation providing the learning 
data followed by an identification step using specific class of nonlinear structures. This so iden- 
tified function gives the output-related guess of the state which can then be amended by a model 
related term in a rather trivial way in order to recover a kind of classical measurement/model 
confidence trade-off. By doing so, one can avoid the risk of local minima that may corrupt the 
quality of the learning data. 



The paper is organized as follows. First, section [XT] defines the state estimation-related iden- 



tification problem. Section III describes the proposed nonlinear identification setting and gives a 
bound on the state estimation error. The way the learning data used in the identification scheme 
is built is shown in section IV Illustrative examples are given in section |V| while section VI 



summarizes the contribution of the paper and suggests hints for further investigations. 

II. State estimation as an identification problem 
Let us consider a nonlinear system given by: 

x(k + l) = f(x(k),u(k)) ; y(k) = h(x(k),u(k)) (1) 

where x E IR n , u E IR n " and y E M. ny represent the state, the measured input and the measured 
output vectors respectively. The integer k refers to the sampling instant kr for some sampling 
period r > 0. Regardless of the algorithm that may be used to reconstruct the state vector using 
the measured quantities, the implicit assumption that underlines the possibility of state recon- 
struction is that there is an integer N E N and a map T such that the following approximation 
holds: 

x(k)^F(Z(k)) (2) 
where Z{k) is the regressor built up with the past measured quantities according to: 

Z(k) := | ^ I E R N (^+ n v)=^ (3) 

\m j 

where: y(k) := {y T {k), ■■■ , y T {k -N + 1)) T and u(k) := u T {k), ■■■ , u T {k -N + 1)) T . Note 
that in the absence of measurement noise, the implicit definition of the map T involved in ([2]) 



is given by the solution of the following optimization problem: 

F(Z(k)) := X(k, k — N + l,x*, u(k)) (4) 

JV-l 

x* := argmin J(x,Z(k)) := V \\y(k - i) - Y(k - i, k - iV + 1, x, u(k))\\ 2 (5) 

i=0 

where X(j, k — N+1, x, u(k)) [resp. Y(j, k — N+1, x, u(k))] refers to the model-based predicted 
value at instant j of the state [resp. output] when the state at instant k — N + 1 is equal to x and 
when the sequence of controls defined by u(k) is applied on the time interval [k — N + 1, k\. 
It results that if one obtains through off-line computations a good approximation of the map 
F involved in ([2]), then the measurement-related part of the MHOs would be obtained without 
on-line optimization. This is obviously a static identification problem. More precisely, since this 
identification is to be obtained for each component of the state vector one needs to reconstruct, 
the basic generic problem one has to solve is the one consisting of finding a nonlinear map F 
that links a scalar quantity r (a component of the state vector) to a regression vector Z, namely: 

r w F(Z) ; r Gi ; Z eW 11 * (6) 

Remark 1: Note that, as suggested in [1J the cost function involved in (|5]) may contain a 
regularization term \\x — x\\ 2 where x stands for the predicted value of the decision variable based 
on the state space model and the last estimate. This enables a regularization of the estimation 
process and enhance more stability of the iterates. Such regularization can be decoupled from the 
identification process by introducing afterward the following final estimation which represents 
a trade-off between the measurement related estimation and the model related estimation based 
on the previous solution: 

x = XF(Z) + (1 - X)x (7) 

where F(Z) is the measurement-based identified part while x is the model predicted part based 
on the past value of the estimation. In the sequel, we focus on F[Z) term which corresponds 
to the choice A = 1 in Q . 

III. A Nonlinear Identification Framework 

Identification of nonlinear relationships is an open issue. Many frameworks have been proposed 
including neural networks, Wiener-Hammerstein, Volterra Series based formulations flU to cite 



but few possibilities. While offering high approximation capabilities, nonlinear approximators 
need non convex optimization in order to compute the approximator parameters. Such optimiza- 
tion problems suffer from computation time issue and the presence of local minima that may 
prevent the solver from reaching the global minimizers. In this paper, a nonlinear approximator 
is proposed that can be computed through constrained QP formulation at the price of lesser 
universality. 

More precisely, in this paper, attention is focused on nonlinear maps F that takes the following 
form: 

F(Z) := T-\L T Z) ; T(-) strictly increasing (8) 

The existence of T 1 is guaranteed by the strict monotonicity of Y. Note that putting together 
(|6]) and ([8]) enables the identification problem to be re-formulated as the one of finding: 

• the vector L e IR™ 2 and 

• the monotonic increasing function T such that the following approximation holds: 

r(r(Jfc)) « L T Z[k) (9) 

Remark 2: Note that ^ is a nonlinear parameterization as one has to find both and L 
which operate nonlinearly. Moreover, while this structure is adapted to the derivation of efficiently 
solvable constrained QP problems, it is not universal in the sense that any function F(Z) cannot 
necessarily be represented using the structure ([8]). Only the class of functions F for which there 
exists a linear combination of the components of Z that maps to Z through a monotonic function 
is eligible. This property is impossible to check a priori, but the efficiency of the QP problem 
solvers makes it easy to check even for very rich parameterization. If the residual is still too 
high, then the system is surely out of this class since failure cannot result from local minima as 
it is the case in standard nonlinear approximators computation. 

The general form of the l.h.s of ((9]) that is used hereafter is the one given by: 



r(r) 



n b 

B C~ r _T. )1 \ B M))\ ^ = £/j£ 0) (^)) 



(10) 



i=i 

where r min and r max are the minimum and maximum values of r over the learning data (see 
section 



IV) while B(r]) is a function basis that is hereafter defined according to: 



B(j % 1 : ={ 1 } n { B ^TZ ln { B? Y i - 1 



where the number of functions in the set is = 2n m while the functions and B% are 



defined by: 



(0 



'1 + ou 



B. 



1 + «j?7 ~ 1 + (Xi — a(t] 

The coefficients a% are given by a% : = exp(/3(l — z)J — 1. Note that many other function basis 
can be used although the author's experience suggests that the function basis proposed above is 
rather appropriate given the monotonicity character of the targeted nonlinear map. 
Now by combining (|9])-(fT0|) , it follows that the unknowns L and /i may be obtained by solving 
the following least squares problem: 



min \ \\B(r](r))n — Z T L\\ 2 under the constraint ([15]) 



(12) 



where £ is the learning data including a large number of instantiations of the pairs (r, Z). Note 



however that the least squares minimization invoked in (12) has to be done taking into account 
the following constraints: 

1) r must be strictly increasing. This leads to the following inequality constraint on the 
parameter vector fi: 

-dB 



Vv e [o, l] 



drj 



(V) 



H > e 



(13) 



for some a priori chosen lower bound of the derivative e > 0. 

2) The following normalization integral constraint has to be satisfied in order to avoid trivial 
zero values trivial solution (L = 0, /i = 0): 

1 

2 



B(rj)drj 



•/i 



+ 7V 



(14) 



Remark 3: Note that the use of (r min + r max )/2 in the r.h.s of (14) is arbitrary since the 



solution of (12) is defined up to a multiplicative gain. Note however that the r.h.s of (14) is 
inspired by the particular case where a linear function fits the learning data. In this case, T can 
be taken to be the identity map and in this case, the integral has to equal the mean value of r. 



The two sets of constraints (13) and (14) are affine in the decision variable ji. Therefore by 



considering a sequence of values = rji < r] 2 < ■ ■ ■ < r\ q 
approximated by the following matrix inequalities: 



[AineqlfJ, < B 



meq 



[Aeq](J, = B eq 



1, these constraints can be 



(15) 



To summarize, the quadratic function (12) in the decision variables L and // is minimized under 



the linear constraints (15) to obtain the optimal parameters L opt and \i opt - 

Note that there is no loss in generality in taking T strictly increasing since it suffices to take L 

of opposite sign to make ([9]) valid for a strictly decreasing map. 



The following result on the estimation error is straightforward: 

Proposition 1: If Let X C W 1 be a subset of the state space to which belong all state of 
interest. The following conditions are satisfied: 

1) The admissible control sequences are bounded (i.e. u E U) and lead for any x^ E X to a 
bounded sequence of output (y E Y) 

2) there is an upper bound e x > on the identification residual: 

sup ||s (0) - T(Z(x {0 \u))\\ < e x (16) 

3) For any initial state x^°\ the combined effect of noise and model mismatches on the regressor 
Z is bounded according to: 

\\Z Teel (x^,u)-Z(x^,u)\\ 00 < 7 (17) 

where Z reel denotes the real measurement matrix (which is not obtained by simulation) that may 
be obtained on the real system starting from x^ and under the control sequence u. 



then the estimation error is of the form: 

\\x-x\\<e x + 0(i) (18) 
Proof. First of all, note that assumption 2) implies that all nominal measurement regressors Z 



of interest belong to some compact set Z. Using the assumptions (16)-(17), one clearly has: 



U(o) _ £(o)| 



^ (0) -^reel(* ( V)) 



< \\x 



(0) 



< e x + max 



F(Z)\\ + \\F(Z)-F(z Ieel ) 



dZ 



(Z) 



z-z. 



reel I 



< s x + Vn;-M( 7 ) -7 



where Z := Z + 5(0,7) an d where M(/y) is the maximum value of the continuous (by 

construction) map — — over the compact set Z. □ 

oZ 



Remark 4: Note that the upper bound e x involved in (16) can be used as a cost function 
e x (N, n m , f3) to be minimized in the decision variable (N,n m ,(3) which are the parameters of 
the approximator. This obviously results in a static game in which the identification step appears 
in the inner loop. This strengthens the relevance of having easy to solve identification problem 
through the constrained QP formulation. Note also that the value of e x reflects to which extent 
the map linking the regressor Z and the initial state is far from the set of maps described by the 
structure ([8]). This is because no identification error can be affected to the optimization process 
as the underlying problem is a quadratic programming one. 

IV. BUILDING THE LEARNING DATA 

Assume without loss of generality that one is focused on the identification problem associated 
to the estimation of r = X{ for some % E {1, . . . ,n}. Let us also assume that the set of relevant 
values of the state vector x is contained in some hypercube, namely: 

ieX:=nf =1 [if m ,iH (19) 



The learning data set S involved in the definition of the least squares problem (12) is obtained 
through the following steps: 

1) First, a set of initial states {a:^}™-! * s chosen. This can be obtained using uniform grid on 
each interval {x™ m ,x™ ax } of possible values of the z-th component of the state. 

2) A set of n g control profiles {tt^} also generated. Each profile 
defines a control sequence over N s > N sampling periods. 

3) For each pair (xq',u^), the system is simulated over N s sampling periods in order to generate 
the corresponding state profiles: 

{X(k,0,x%\u®) , ke{l,...,N s }Y^ (20) 

4) The data described above enables, for each pair (j, k) G {1, . . . , n g } x {N, . . . , iV s } to obtain: 

• two sequences y^\k), u^\k) that defines Z^\k) according to 

• the corresponding value of r^\k) according to 

r u \k) := X (N, 0, X(k - N, a#' J , u u) ), u {j) ^ 



Finally, the learning data set £ involved in the definition of the least squares problem ( fT2| ) is 
given by: 

£ : = f(r®(k),ZU\k))\ (21) 

L > (j,k)e{l,-,n g }x{N,...,N 3 } 

which is obviously a discrete set of cardinality % that is given by: 

n E := n g -(N s -N + 1) (22) 

V. ILLUSTRATIVE EXAMPLES 

In this section, illustrative examples are proposed in order to give concrete instantiations of the 
different steps of the proposed framework. 

A. Example 1 

Let us consider the famous Van der Pol oscillator which is governed by: 

±i = -x 2 ; x 2 = Ax i + (1 - x\)x 2 ; y:=xi + x 2 + w (23) 

where to is a measurement noise assumed here to be white, Gaussian with variance a = 0.2. 
Typical behavior of the resulting noisy measurement is shown in Figure [T] (left subplot) together 
with typical corresponding level of the noise (right subplot). The basic sampling period is taken 
equal to r = 0.1 sec. The bounds on the state components leading to the definition of the subset 
X are given by X := [—2, +2] x [—5, +5] which obviously enhances the nonlinear character of 
the resulting identification problem (as 2 2 is not negligible when compared to 1 in the expression 
of x 2 ). The learning data has been defined using a uniform grid containing n g = 4 2 = 16 initial 
states. For each state, the system is simulated during N s = 100 sampling period (10 sec) which, 



according to (22) leads to a learning data £ of cardinality n E = 16 x (100 — 10 + 1) = 1456. 
Note that since there are two states x\ and x 2 to be estimated, two identification problems are 
solved and two nonlinear maps and are obtained for the reconstruction of these two 
states according to Xi(k) = F^ l '(Z(k)), i E {1,2} The identification parameters N = 10 and 
n m = 5 are used which leads to an observation horizon of N x r = 1 sec and a functional basis 
containing n& = 10 elements. 

The quality of the resulting matching between the identified and the true values are shown in 
the upper subplots of Figure [2] (left plots). Note that the identified values are obtained using 
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Fig. 1. Example 1. Typical behavior of the measured output and the corresponding noise. The Gaussian noise used in {23} is 
generated with a variance a = 0.2 

an intentionally noised simulation data which enables one to inject the noise already in the 
identification process resolving the trade-off at this early stage. The lower subplots of Figure [2] 
(Left) shows the gradient of the resulting nonlinear functions and r^ 2 ^ that are involved in 
([8]) for the two components of the state respectively. One may note the highly nonlinear character 
of the map in particular. An estimation scenario is shown in Figure [2] (right plots) where 
the initial state of the system and the observer are respectively given by x(0) = (2,0.5) T and 
£(0) = (1,0. 7) T and where a new generation of the measurement noise is used (different from 
the ones used to construct the learning data set). Note that the correction of the observer begins 
only when data is obtained that covers the observer horizon length. This means that the first 
correction occurs at t = (N — l)r = 0.9 sec. It is worth emphasizing here that the observer 
design is done using only the output measurement related correction in order to concentrate 
on the contribution of the present paper. It goes without saying that a balanced estimation in 
which the output-related estimation and the state equation related estimation can be implemented 
following @ or more generally the standard ideas of JU, lfP3l and the references therein. 

B. Example 2 

Let us consider the problem of estimating the state of a dynamic model describing the Escherichia 
Coli Strain. Many knowledge-based model derivation attempts have been investigated in order to 



Estimation of xi 




Fig. 2. Example 1. (Left): A picture showing the results of the nonlinear identification procedure using a learning set of 
cardinality tij = 1456. The upper plots compares the estimated states based on the identified maps f^ 1 ' and while the 
lower plots show the gradient of the corresponding monotonic functions r' 1 ' and r' 2 ' (Right) : Typical results of the state 
estimation using a purely output-related estimation. The observer begins once the first TV = 10 first measurements are acquired 
(first correction at t = 0.9 sec). This estimation is obtained while a Gaussian measurement noise w of variance a = 0.2 is 
injected (see figure [TJ for a typical behavior of the noise) 



better understand the mechanisms that underline the evolution of the population @, iTTOl or to 
develop model-based state observers lfl2l . The dynamic model that is commonly used in deriving 
dynamic state estimation involves the E. Coli strain X that grows on the limiting substrate S 
while yielding a final intracellular product: the /3-galactosidae P. The model is given by: 

X = fj,(S)X -k d exp(-^)X (24) 
S = -y.rtS)X - k m X (25) 
P = y p fi(S)j^-X - k d exp(-^)P (26) 



where /x is the growth rate that is modelled using classical Monod-type relation such as (jl(S) : = 
/; 

where \i m is the maximum specific growth rate for the cell growth (in h ). k s is the 



k s + S 

half saturation constant; k p and k d are constants involved in the Arrhenius-type death kinetic that 
depends on P. k m is a maintenance rate that describes the energy required for normal upkeep and 



repair. y s , y t [used in the measurement equation (27) below] and y p are identified coefficients. 
I stands for the arabinose inducer that is assumed to be constant (no degradation). The output 
measurement vector is given by y := (X + w\, L + w 2 ) T where L is the light produced by the 



bioluminescence that is linked to the state variables by the following expressions: 

L = yi -^S)-^—XP (27) 

1 + h 

while w\ and w 2 are measurement noises that are taken here white, Gaussian and of variances 
o\ and cr 2 respectively. The values of the model parameters used in the sequel can be found in 

ma. 

For this example, the framework described above can be used to construct a reduced observer. 
More precisely, it is shown hereafter that there is a satisfactory solution to the underlined 
identification problem for any learning set that is constructed using an admissible set of ini- 
tial conditions of the form X(X ) := {-^0} x [0, 5] x [0, 0.3] with the following parameters 
r = 0.2 ; N s = 40 ; N = 3 ; n m = 5 ; n g = 25. This means that for each initial value X of the 
E. Coli strain, n g = 25 simulations of the system with different initial states (sharing all the same 
value X and different values of P an d So) is simulated during 8 time units (= N s r = 40 x 0.2) 
hence generating a learning set of cardinality he given by % = 25 x (40 — 3) = 925. Note that 
two identification problems are to be defined and solved using the following definition of the 
quantities r x and r 2 to be identified: 

n := P ; r 2 = fi(S)-X (28) 

Note that if r\ and r 2 are well estimated then fi(S) can also be well estimated since X is 
assumed to be measured (reduced observer). Note also that since only fi(S) is involved in the 
system equation, S is necessarily estimated through yu(S). 

The identification results are shown on Figure [3] for two different values of X G {0.5,2}. One 
can appreciate that a good match between the estimated fj and the simulated for i = 1, 2 is 
obtained over the 925 learning set while using a rather economic parametrization (N = 3 and 
n m = 5). This clearly shows that the proposed methodology enables us to perform the estimation 
scheme provided that the initial value of the state component X is measured at the beginning 
of the batch. 

Figure [4] shows typical behavior of the output-based state estimation that used the two nonlinear 
maps identified above. Note that the noises W\ and w 2 that affect the measured signals used in 
the construction of the regressor are given by <j\ — 0.1 and cr 2 = 20. This leads to a noise level 
that can be observed in Figure [5j 
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(a) Identification results for X(0) = 0.5 (b) Identification results for X(0) = 2.0 

Fig. 3. Example 2. Examples of Identification results for ri = P and T2 = fJ*(S) ■ X for two different values of the E. Coli 
strain X Q £ {0.5,2}. 
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Fig. 4. Example 2. Behavior of the output-based estimation of 
n = P and T2 = l-i(S) ■ X. The measurement noise variances 
are respectively given by cri = 0.1 and 02 = 20. Typical 
behavior of the measurement noise can be observed oin Figure 



Fig. 5. Example 2: Typical behavior of the measurement noise 
used in the simulation of the state estimation depicted in Figure 

El 



VI. Conclusion and future work 

In this paper, a nonlinear approximator has been proposed for a class of nonlinear relationships 
and has been applied in the context of moving-horizon observer design. The proposed scheme 
offers the advantage of requiring only a constrained QP problem solution and can therefore be 



efficiently integrated in the inner loop of a global scheme aiming at optimizing the approximator 
parameters. 

A potential research line is to investigate a systematic computation of the optimal triplet (N, r, (3) 
defining the nonlinear approximator. Indeed, a convenient {optimal) choice of these parameters 
is intimately linked to the noise level as well as the uncertainty structure. A good knowledge of 
the latter is crucial to obtain pertinent choice of these parameters which had been found in this 
paper by trial and error approach. 

Another research track concerns the use of sparse identification techniques in order to derive low 
dimensional parameter vector. This can be greatly facilitated by the availability of the Lagrange 
multipliers of the QP problem that underline the identification step. 
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